Part-of-Speech Tagging of Portuguese Based on Variable Length Markov Chains
Authors
Abstract
Tagging is the task of assigning to each word in a text, given its context, the corresponding Part-of-Speech (PoS) class. In this work, we employed Variable Length Markov Chains (VLMC) for tagging, in the hope of capturing long-distance dependencies. We obtained one of the best PoS tagging results for Portuguese, with a precision of 95.51%. More surprisingly, we did so with a total training and execution time of less than 3 minutes for a corpus of almost 1 million words. However, long-distance dependencies are not well captured by the VLMC tagger, and we investigate the reasons for this and the limitations of using VLMCs. Future research in statistical linguistics on long-range dependencies should concentrate on other ways of overcoming this limitation.
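To make the idea of a variable-length history concrete, here is a minimal Python sketch of predicting the next tag from the longest context that the training data supports. It is an illustration under stated assumptions, not the tagger described in the abstract: the function names, the toy tag sequences, and the `min_count` threshold are all hypothetical, and a simple frequency cutoff stands in for the statistical pruning that an actual VLMC context tree would use.

```python
from collections import Counter, defaultdict


def train_contexts(tag_sequences, max_order=3, min_count=2):
    """Count next-tag frequencies for every tag context up to max_order.

    Contexts seen fewer than min_count times are dropped (an assumed,
    simplified pruning rule), so the usable history length varies with
    how much data supports each context.
    """
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        for i, tag in enumerate(tags):
            for order in range(max_order + 1):
                if order > i:
                    break
                context = tuple(tags[i - order:i])
                counts[context][tag] += 1
    # The empty context is always kept as the final fallback.
    return {
        ctx: nxt for ctx, nxt in counts.items()
        if not ctx or sum(nxt.values()) >= min_count
    }


def longest_context(model, history, max_order=3):
    """Return the longest suffix of history that the model retained."""
    for order in range(min(max_order, len(history)), -1, -1):
        context = tuple(history[len(history) - order:])
        if context in model:
            return context
    return ()


def predict_tag(model, history, max_order=3):
    """Pick the most frequent next tag after the longest retained context."""
    context = longest_context(model, history, max_order)
    return model[context].most_common(1)[0][0]


# Toy training data (hypothetical tag sequences, not from any corpus).
tag_sequences = [
    ["DET", "N", "V", "DET", "N"],
    ["DET", "ADJ", "N", "V"],
    ["PRO", "V", "DET", "N"],
]
model = train_contexts(tag_sequences)
print(longest_context(model, ["PRO", "V", "DET"]))  # ('V', 'DET')
print(predict_tag(model, ["V", "DET"]))             # N
```

In this toy run the three-tag history `PRO V DET` is too rare to be kept, so the lookup backs off to the two-tag context `V DET`. The effective memory length depends on the data rather than being fixed in advance.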
Similar resources
Part-of-Speech Tagging Using a Variable Memory Markov
We present a new approach to disambiguating syntactically ambiguous words in context, based on Variable Memory Markov (VMM) models. In contrast to fixed-length Markov models, which predict based on fixed-length histories, variable memory Markov models dynamically adapt their history length based on the training data, and hence may use fewer parameters. In a test of a VMM based tagger on the Brown c...
Part-of-Speech Tagging using a Variable Memory Markov Model
We present a new approach to disambiguating syntactically ambiguous words in context, based on Variable Memory Markov (VMM) models. In contrast to fixed-length Markov models, which predict based on fixed-length histories, variable memory Markov models dynamically adapt their history length based on the training data, and hence may use fewer parameters. In a test of a VMM based tagger on the Bro...
Part-of-Speech Tagging of Persian Using a Fuzzy Network Model
Part-of-speech tagging (POS tagging) is an ongoing research topic in natural language processing (NLP). The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...
Variable-Length Markov Models and Ambiguous Words in Portuguese
Variable-Length Markov Chains (VLMCs) offer a way of modeling contexts longer than trigrams without suffering from data sparsity and state space complexity. However, in Historical Portuguese, two words show a high degree of ambiguity: que and a. The number of errors tagging these words corresponds to a quarter of the total errors made by a VLMC-based tagger. Moreover, these words seem to show tw...